---
id: openai-agents
title: OpenAI Agents
sidebar_label: OpenAI Agents
---

import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import { Timeline, TimelineItem } from "@site/src/components/Timeline";
import VideoDisplayer from "@site/src/components/VideoDisplayer";
import ColabButton from "@site/src/components/ColabButton";

# OpenAI Agents

[OpenAI Agents](https://openai.com/agents/) is OpenAI's framework for building agents that can use tools, hand off tasks to other agents, and carry out multi-step workflows.

## End-to-End Evals

`deepeval` allows you to evaluate OpenAI Agents in **under a minute**.

<Timeline>

<TimelineItem title="Configure OpenAI Agents">

```python title="main.py" showLineNumbers
from agents import Runner, add_trace_processor
from deepeval.openai_agents import Agent, DeepEvalTracingProcessor
from deepeval.metrics import AnswerRelevancyMetric

# Register deepeval's tracing processor so agent runs are traced and evaluated
add_trace_processor(DeepEvalTracingProcessor())

# Attach metrics to the agent; deepeval evaluates them on each invocation
weather_agent = Agent(
    name="Weather Agent",
    instructions="You are a weather agent. You are given a question about the weather and you need to answer it.",
    agent_metrics=[AnswerRelevancyMetric()],
)
```

:::info
Only metrics whose evaluation parameters are limited to `input` and `output` are eligible for evaluation.
:::
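
For example, a custom `GEval` metric that only references the agent's input and actual output would also qualify. Below is a minimal sketch; the metric name and criteria are illustrative:

```python
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

# Eligible: evaluation parameters are limited to input and actual output
correctness_metric = GEval(
    name="Correctness",
    criteria="Determine whether the output correctly answers the weather question in the input.",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)
```

You can pass `correctness_metric` to `agent_metrics` alongside `AnswerRelevancyMetric` above.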

</TimelineItem>
<TimelineItem title="Run evaluations">

Create an `EvaluationDataset` and invoke your OpenAI Agent for each golden within the `evals_iterator()` loop to run end-to-end evaluations.

<Tabs groupId="openai-agents">
<TabItem value="synchronous" label="Synchronous">

```python title="main.py" showLineNumbers
from deepeval.dataset import EvaluationDataset, Golden

dataset = EvaluationDataset(
    goldens=[
        Golden(input="What's the weather in UK?"),
        Golden(input="What's the weather in France?"),
    ]
)

for golden in dataset.evals_iterator():
    Runner.run_sync(weather_agent, golden.input)
```

</TabItem>
<TabItem value="asynchronous" label="Asynchronous">

```python title="main.py" showLineNumbers
import asyncio
from deepeval.dataset import EvaluationDataset, Golden

dataset = EvaluationDataset(
    goldens=[
        Golden(input="What's the weather in UK?"),
        Golden(input="What's the weather in France?"),
    ]
)

for golden in dataset.evals_iterator():
    task = asyncio.create_task(Runner.run(weather_agent, golden.input))
    dataset.evaluate(task)
```
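
Passing each task to `dataset.evaluate()` queues the agent invocation so the `evals_iterator()` can await all tasks together before generating the test run.
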
</TabItem>

</Tabs>

✅ Done. The `evals_iterator` will automatically generate a test run with individual evaluation traces for each golden.

</TimelineItem>
</Timeline>